Locality-sensitive Hashing without False Negatives
Abstract
We consider a new construction of locality-sensitive hash functions for Hamming space that is covering, in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius r. The construction is efficient in the sense that the expected number of hash collisions between vectors at distance cr, for a given c > 1, comes close to that of the best possible data-independent LSH without the covering guarantee, namely the seminal LSH construction of Indyk and Motwani (FOCS ’98). The efficiency of the new construction essentially matches their bound if cr = log(n)/k, where n is the number of points in the data set and k ∈ N. For general values of cr, it differs from their bound by at most a factor ln(4) < 1.4 in the exponent. As a consequence, LSH-based similarity search in Hamming space can avoid the problem of false negatives at little or no cost in efficiency.
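The covering property can be illustrated with a small Python sketch. It assumes a linear construction over GF(2), which is our illustration and not necessarily the paper's exact or tuned construction. Each coordinate i of {0,1}^d is assigned a random vector m(i) in {0,1}^(r+1). For every nonzero v in {0,1}^(r+1), a mask a_v is built whose i-th bit is ⟨m(i), v⟩ mod 2, and the hash of x is x AND a_v. Here is why a collision is forced. If x and y differ in at most r positions, the vectors m(i) at those positions span at most r dimensions of an (r+1)-dimensional space. Some nonzero v is therefore orthogonal to all of them, and its mask hides every differing bit. The names covering_lsh_family, hash_value, and collides are hypothetical.

```python
import itertools
import random


def covering_lsh_family(d, r, seed=None):
    """Return 2**(r+1) - 1 bit masks a_v over {0,1}^d (illustrative sketch).

    Assumed construction: each coordinate i gets a random vector m(i) in
    {0,1}^(r+1); for every nonzero v in {0,1}^(r+1), bit i of mask a_v is
    <m(i), v> mod 2.
    """
    rng = random.Random(seed)
    t = r + 1
    m = [[rng.randint(0, 1) for _ in range(t)] for _ in range(d)]
    masks = []
    for v in itertools.product((0, 1), repeat=t):
        if not any(v):
            continue  # the zero vector gives the all-zero mask, under which every pair collides
        masks.append([sum(a * b for a, b in zip(m[i], v)) % 2 for i in range(d)])
    return masks


def hash_value(x, mask):
    """h_v(x) = x AND a_v: keep only the coordinates selected by the mask."""
    return tuple(xi & ai for xi, ai in zip(x, mask))


def collides(x, y, masks):
    """True if x and y share a bucket under at least one mask in the family."""
    return any(hash_value(x, a) == hash_value(y, a) for a in masks)


if __name__ == "__main__":
    d, r = 32, 3
    masks = covering_lsh_family(d, r, seed=1)
    rng = random.Random(2)
    for _ in range(200):
        x = [rng.randint(0, 1) for _ in range(d)]
        y = list(x)
        for i in rng.sample(range(d), rng.randint(0, r)):
            y[i] ^= 1  # flip at most r coordinates
        assert collides(x, y, masks)  # no false negatives within radius r
    print(len(masks))  # 2**(r+1) - 1 = 15
```

Taking all 2^(r+1) - 1 nonzero v keeps the family small and makes the guarantee easy to check. The sketch does not attempt the tuning that governs how often pairs at distance cr collide, which is what the efficiency claim in the abstract concerns.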
Similar resources
Locality-Sensitive Hashing Without False Negatives for l_p
In this paper, we show a construction of locality-sensitive hash functions without false negatives, i.e., hash functions that ensure a collision for every pair of points within a given radius R in d-dimensional space equipped with the l_p norm, for p ∈ [1,∞]. Furthermore, we show how to use these hash functions to solve the c-approximate nearest neighbor search problem without false negatives. Namely, if there is a...
MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives ...
On fast bounded locality sensitive hashing
In this paper, we examine hash functions expressed as scalar products, i.e., f(x) = ⟨v, x⟩, for some bounded random vector v. Such hash functions have numerous applications, but often there is a need to optimize the choice of the distribution of v. In the present work, we focus on so-called anti-concentration bounds, i.e., upper bounds on P[|⟨v, x⟩| < α]. In many applications, v is...
Fast indexing strategies for robust image hashes
Similarity preserving hashing can aid forensic investigations by providing means to recognize known content and modified versions of known content. However, this raises the need for efficient indexing strategies which support the similarity search. We present and evaluate two indexing strategies for robust image hashes created by the ForBild tool. These strategies are based on generic indexing ...
Hyperplane Arrangements and Locality-Sensitive Hashing with Lift
Locality-sensitive hashing converts high-dimensional feature vectors, such as those of images and speech, into bit arrays and allows high-speed similarity calculation with the Hamming distance. There is a hashing scheme that maps feature vectors to bit arrays depending on the signs of the inner products between feature vectors and the normal vectors of hyperplanes placed in the feature space. This hashin...
Publication date: 2016